Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(lint): Implement front matter validator and linter #8295

Closed
wants to merge 3 commits into from

Conversation

OnkarRuikar
Copy link
Contributor

@OnkarRuikar OnkarRuikar commented Feb 23, 2023

Summary

Problem

Front matter now hold many important and complex values. Without the automated validator it starts accumulating errors. And there is no tool to find and report/fix these issues.

Implement a custom linter that will satisfy following requirements:

  • validation:

    • Make some attributes, like title and slug, mandatory.
    • Validate page-type attribute. And allow only certain values based on technology/feature of the document.
    • Enforce data types/formats/lengths. For example, spec-urls need to be URI format.
  • prettier:

    • Indent yaml, remove trailing spaces, fix line widths etc.
    • Make sure attributes are in specific order: title -> slug -> page-type -> status -> browser-compat -> spec-urls
    • For attributes browser-compat and spec-urls support union data type string|array. If there is only one value then use string other wise use array for multivalue:
    # single value
    spec-urls: https://abcd.com/api.html
    
    # multiple values
    spec-urls:
      - https://abcd.com/api.html
      - https://xyz.com/property.html
  • automation:

    • Make a PR check workflow that will flag validation errors. I.e., option to fix specified list of files.
    • Daily chron job workflow that will pretty and fix order. Just like the current daily markdownlint-fix workflow.

Solution

Implement a custom solution similar to the filecheck tool.
Convert front matter YAML to JSON object and validate using JSON Schema which also supports the required union type. Following are the required steps:

  • define JSON schema for front matter. See front-matter-config.json.
  • use js-yaml library to convert yaml to json
  • use ajv library to validate the schema
  • using custom logic validate the attribute order
  • pretty print yaml using js-yaml library

To run the tool use following commands:

# check few files
yarn tool fmlint  "/.../index.md " "/.../index.md"

# pretty print and auto fix front matter
yarn tool fmlint --fix true "/.../index.md"

# check all files
yarn tool fmlint > ~/logs/fm_issues.txt

# auto pretty all files
yarn tool fmlint --fix true > ~/logs/fm_issues.txt

Let me know about following possiblities:

  • move front-matter-config.json to mdn/content repo, so that configuration won't be attached to yari
  • for each value a regular expression can be specified for tighter control
  • maxLength for title is set to 150 chars in the PR, what is the required value?

Screenshots

yarn tool fmlint /home/koder/GitHub/mdn/content/files/en-us/glossary/aria/index.md /home/koder/GitHub/mdn/content/files/en-us/web/svg/element/a/index.md > ~/_temp/fmfaults.txt 
progress [========================================] 100% | ETA: 0s | 2/2

error: Error: ../content/files/en-us/glossary/aria/index.md
       'page-type' property must be equal to one of the allowed values: 
       	glossary-definition, glossary-disambiguation
       
       Error: ../content/files/en-us/web/svg/element/a/index.md
       Front matter must have required property 'slug'
       'page-type' property must be equal to one of the allowed values: 
       	guide, landing-page
       
       Following fixable errors can be fixed using '--fix true' option
       ../content/files/en-us/web/svg/element/a/index.md
       	 Front matter attributes are not in required order: title->slug->page-type->status->browser-compat->spec-urls

error Command failed with exit code 1.                                     

How did you test this change?

@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Feb 23, 2023
@OnkarRuikar OnkarRuikar force-pushed the front_matter_validator branch from 857ce9f to ac1a07b Compare February 23, 2023 15:02
@github-actions github-actions bot removed the dependencies Pull requests that update a dependency file label Feb 23, 2023
@OnkarRuikar OnkarRuikar reopened this Feb 23, 2023
@github-actions github-actions bot added the dependencies Pull requests that update a dependency file label Feb 23, 2023
@caugner caugner self-requested a review February 23, 2023 15:06
@caugner caugner added the wx writer experience label Feb 23, 2023
@OnkarRuikar OnkarRuikar force-pushed the front_matter_validator branch from f512b75 to 18a8f5d Compare February 23, 2023 15:09
@caugner
Copy link
Contributor

caugner commented Feb 23, 2023

@OnkarRuikar Thank you for this. Could we add this as an fmlint command to tool/cli.ts?

Maybe you could use a pattern like this to avoid defining the command inside cli.ts:

export function addRedirectCommand(program: Program) {
return program
.command("add-redirect", "Add a new redirect")
.argument("<from>", "From-URL")
.argument("<to>", "To-URL")
.action(
tryOrExit(({ args, logger }: AddRedirectActionParameters) => {
const from = new URL(args.from).pathname;
const to = new URL(args.to).pathname;
const locale = from.split("/")[1];
Redirect.add(locale, [[from, to]]);
logger.info(chalk.green(`Saved '${from}' → '${to}'`));
})
);
}

edit: Have you tried out the validator/linter on translated-content yet? As it makes it easier, we could run it on mdn/content only to start with. In translated-content, there are some additional frontmatter keys, and some that we don't want to duplicate from content.

@OnkarRuikar
Copy link
Contributor Author

Could we add this as an fmlint command to tool/cli.ts

done

As it makes it easier, we could run it on mdn/content only to start with.

I already did. As per the config, there were only 3 validation issues. And there are 350+ files that need prettying and attribute order fix.

Currently we are in process of removing tags attribute from content front matter. It'll be easy to run the tool after the activity without creating conflicts with those PRs.

In translated-content, there are some additional frontmatter keys, and some that we don't want to duplicate from content.

I haven't run this on translated content. We'll have to define separate schema in front-matter-config.json for translated content. It'll solve the different attributes issue.


@teoli2003 , @wbamberg let me know if this configuration ok to address the #7900 (comment).

@wbamberg
Copy link
Collaborator

This is so great.

@teoli2003 , @wbamberg let me know if this configuration ok to address the #7900 (comment).

I think one day we might want to have the set of page types in a separate file, to make it easier to reference from other places. But as it stands this is really easy to read and update, and it'll be great to have validation for this.

I saw that you have "global" but also list the global entries in each section. Is there a reason to have both?

@OnkarRuikar
Copy link
Contributor Author

and it'll be great to have validation for this.

The code does validate page-type:

Error: ../content/files/en-us/glossary/aria/index.md
       'page-type' property must be equal to one of the allowed values: 
       	glossary-definition, glossary-disambiguation

The code alters the schema based on document location to inject allowed page types.

I saw that you have "global" but also list the global entries in each section. Is there a reason to have both?

The MDN and Glossary trees don't have guide or landing-page values: "MDN/": ["landing-page", "mdn-community-guide", "mdn-writing-guide"]. So someone might use guide type instead of mdn-community-guide in MDN tree. So I used explicit lists.
But "global" got used for the trees which are not mentioned in the config. I think this is not correct, and we need to explicitly specify page types for each tree. For example, till we don't specify page types for mathml in the config, the validator shouldn't allow page-type attribute itself on those pages. Suggesting global in such case will force people to use wrong ones and when we actually come up with page types for mathml then the validator won't complain cos global types will be treated as correct.

@wbamberg
Copy link
Collaborator

and it'll be great to have validation for this.

The code does validate page-type:

Sorry, I didn't mean to imply that it didn't. I mean, "it is great that this PR will give us validation for page type".

Error: ../content/files/en-us/glossary/aria/index.md
       'page-type' property must be equal to one of the allowed values: 
       	glossary-definition, glossary-disambiguation

The code alters the schema based on document location to inject allowed page types.

I saw that you have "global" but also list the global entries in each section. Is there a reason to have both?

The MDN and Glossary trees don't have guide or landing-page values: "MDN/": ["landing-page", "mdn-community-guide", "mdn-writing-guide"]. So someone might use guide type instead of mdn-community-guide in MDN tree. So I used explicit lists.

I think this is an error and these pages should just use "guide".

But "global" got used for the trees which are not mentioned in the config. I think this is not correct, and we need to explicitly specify page types for each tree. For example, till we don't specify page types for mathml in the config, the validator shouldn't allow page-type attribute itself on those pages. Suggesting global in such case will force people to use wrong ones and when we actually come up with page types for mathml then the validator won't complain cos global types will be treated as correct.

That's a good point. I guess once we are done, we will be able to change this.

@caugner
Copy link
Contributor

caugner commented Feb 24, 2023

I think this is an error and these pages should just use "guide".

I feel like using "guide" for the MDN meta docs overloads the page-type, because those (writer) guides/guidelines are probably distinct from the (user) guides. Are they not?

@teoli2003
Copy link
Contributor

page-type: meta-XYZ so that we can separate easily meta documents?

@wbamberg
Copy link
Collaborator

wbamberg commented Apr 12, 2023

page-type: meta-XYZ so that we can separate easily meta documents?

Actually yes, I agree with this. I don't think these meta-docs should be in MDN at all, so using a different page-type would at least enable people to ignore them. So:

meta-landing-page
meta-guide

?

But, let's not block this great PR on that discussion.

wbamberg
wbamberg previously approved these changes Apr 12, 2023
@teoli2003
Copy link
Contributor

Yes, there is no reason to block this PR on this: there will be additional page types in the future.

@wbamberg wbamberg dismissed their stale review April 12, 2023 18:27

Reconsidering

@wbamberg
Copy link
Collaborator

After discussing with @teoli2003 :

move front-matter-config.json to mdn/content repo, so that configuration won't be attached to yari

yes, I think this at least is quite important (it means we can avoid having to coordinate PRs in 2 repos if we change things). But also, is there a reason this validation is in Yari at all? We are validating the content here, can we not have this validation in mdn/content CI?

@OnkarRuikar OnkarRuikar force-pushed the front_matter_validator branch from d058166 to 0f8072b Compare April 13, 2023 07:37
@OnkarRuikar
Copy link
Contributor Author

OnkarRuikar commented Apr 13, 2023

But also, is there a reason this validation is in Yari at all? We are validating the content here, can we not have this validation in mdn/content CI?

This way makes it available in various contexts like in husky, translated-content(CI & husky), from yari(with .env set it'll lint both content and translated-content), and manual executions.

move front-matter-config.json to mdn/content repo, so that configuration won't be attached to yari

yes, I think this at least is quite important (it means we can avoid having to coordinate PRs in 2 repos if we change things).

Only thing remains is to decide on line length. With current 80 chars llimit, 54 files will have changes like below. Following is the most extreme one:

---
title: "ContentVisibilityAutoStateChangeEvent: ContentVisibilityAutoStateChangeEvent() constructor"
short-title: ContentVisibilityAutoStateChangeEvent()
slug: Web/API/ContentVisibilityAutoStateChangeEvent/ContentVisibilityAutoStateChangeEvent
page-type: web-api-constructor
browser-compat: api.ContentVisibilityAutoStateChangeEvent.ContentVisibilityAutoStateChangeEvent
---

to

---
title: >-
  ContentVisibilityAutoStateChangeEvent: ContentVisibilityAutoStateChangeEvent()
  constructor
short-title: ContentVisibilityAutoStateChangeEvent()
slug: >-
  Web/API/ContentVisibilityAutoStateChangeEvent/ContentVisibilityAutoStateChangeEvent
page-type: web-api-constructor
browser-compat: >-
  api.ContentVisibilityAutoStateChangeEvent.ContentVisibilityAutoStateChangeEvent
---

If we are ok with current config then I'll create a PR in content.

@github-actions github-actions bot added the merge conflicts 🚧 Please rebase onto or merge the latest main. label Apr 17, 2023
@github-actions
Copy link
Contributor

This pull request has merge conflicts that must be resolved before it can be merged.

@wbamberg
Copy link
Collaborator

But also, is there a reason this validation is in Yari at all? We are validating the content here, can we not have this validation in mdn/content CI?

This way makes it available in various contexts like in husky, translated-content(CI & husky), from yari(with .env set it'll lint both content and translated-content), and manual executions.

But the set of permitted/mandatory keys is different in mdn/content and in mdn/translated-content. I think it would be OK to have the machinery that validates content against the config to be in Yari, but I think the config should be alongside the content.

@OnkarRuikar OnkarRuikar marked this pull request as draft April 25, 2023 01:39
@OnkarRuikar
Copy link
Contributor Author

But the set of permitted/mandatory keys is different in mdn/content and in mdn/translated-content. I think it would be OK to have the machinery that validates content against the config to be in Yari, but I think the config should be alongside the content.

I see what you mean. As the structure and features in content and translated-content are different, it will be hard to write/maintain perfect code that will work with both. There are going to be a lot of 'if's and 'else's e.g. if translated-content path then do this, if content path then ignore that etc.

@wbamberg and @teoli2003 may I move this entire utility(code+config) to mdn/content/scripts? Once it starts working there I'll write a separate one for translated -content. Also, piggybacking this in yari is taking a long time to merge.

@OnkarRuikar
Copy link
Contributor Author

This feature has been moved to mdn/content repo.

@kaz211111
Copy link

"transform"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dependencies Pull requests that update a dependency file merge conflicts 🚧 Please rebase onto or merge the latest main. wx writer experience
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Required linter/validator for frontmatter
5 participants